Spatial interpolation is the process of estimating values at unsampled locations based on measurements from sampled locations. This is crucial in environmental sciences where we often have point measurements (weather stations, soil samples) but need continuous surfaces for analysis.
In this tutorial, you will:
We’ll use weather station data from Denmark (DMI - Danish Meteorological Institute), focusing on Mean Annual Temperature (MAT). The data includes:
| Task | Function | Package | Key Arguments |
|---|---|---|---|
| Read CSV | read.csv() |
base | url/file |
| Create sf | st_as_sf() |
sf | data, coords, crs |
| Transform CRS | st_transform() |
sf | object, crs |
| Create raster | rast() |
terra | object, res |
| Mask raster | mask() |
terra | x, mask |
| IDW | idw() |
gstat | formula, locations, newdata, nmax, idp |
| TPS | Tps() |
fields | x, Y |
| Variogram | variogram() |
gstat | object, cutoff |
| Fit variogram | autofitVariogram() |
automap | formula, input_data, model |
| Kriging | krige() |
gstat | formula, locations, newdata, model |
| Auto kriging | autoKrige() |
automap | formula, input_data, new_data |
| Cross-validation | krige.cv() |
gstat | formula, locations, nfold, model |
| K-fold | kfold() |
dismo | x, k |
Geometry Errors: If you encounter errors like “Loop
is not valid” or “duplicate vertex”, use st_make_valid() to
fix invalid geometries in spatial objects before operations like
cropping or intersecting.
# Spatial data handling
library(sf) # Simple features for vector data
library(terra) # Raster/vector operations
library(gstat) # Spatial statistics and Kriging
# Data manipulation and visualization
library(dplyr) # Data wrangling
library(ggplot2) # Visualization
library(tidyterra) # Visualizing rasters with ggplot2
# Mapping
library(rnaturalearth) # Natural Earth map data
# Specialized interpolation
library(fields) # Thin plate splines
library(dismo) # Cross-validation
library(sp) # Spatial classes (older format)
library(automap) # Automatic variogram fitting
library(raster) # Convert between raster formatsHINT: Packages are: sf,
terra, gstat, dplyr,
ggplot2, tidyterra,
rnaturalearth, fields, dismo,
sp, automap, raster
# Load weather station data from online source
DMI.Stations <- read.csv("https://raw.githubusercontent.com/AlejoOrdonez/StaGeoMod2021/main/StationsWithClim.csv")
# View variable names
names(DMI.Stations)## [1] "X" "StationId" "Name"
## [4] "Country" "Owner" "StationType"
## [7] "Status" "StationHeightabvMSL.m." "BarometerHeight..m."
## [10] "Latitude" "Longitude" "Region"
## [13] "Operation.From" "Country.Code" "StationId.1"
## [16] "MAT.C" "AnnPrec.mm.yr"
## [1] 211 17
## X StationId Name Country Owner StationType Status
## 1 1 6141 Abed DNK DMI Synop Active
## 2 2 6079 Anholt Havn DNK DMI Synop Active
## 3 3 6123 Assens/Torø DNK DMI Synop Active
## 4 4 6081 Blåvandshuk Fyr DNK DMI Synop Active
## 5 5 6082 Borris DNK DMI Synop Active
## 6 6 6154 Brandelev DNK DMI Synop Active
## StationHeightabvMSL.m. BarometerHeight..m. Latitude Longitude Region
## 1 7.00 9.12 54.8275 11.3292 6
## 2 2.36 3.55 56.7169 11.5098 6
## 3 1.65 3.39 55.2444 9.8882 6
## 4 13.00 17.62 55.5575 8.0828 6
## 5 25.00 NA 55.9591 8.6242 6
## 6 44.52 46.11 55.2075 11.8605 6
## Operation.From Country.Code StationId.1 MAT.C AnnPrec.mm.yr
## 1 31/08/1999 6080 6141 8.8 582
## 2 20/05/1993 6080 6079 8.5 570
## 3 08/05/2001 6080 6123 8.8 680
## 4 18/09/1980 6080 6081 9.1 757
## 5 17/04/2002 6080 6082 8.6 899
## 6 21/01/2004 6080 6154 8.5 648
TASK: Fill in functions to read CSV from URL, view names, dimensions, and first rows.
# Convert data frame to sf POINT object
DMI.Stations.Shp <- st_as_sf(DMI.Stations,
coords = c("Longitude", "Latitude"),
crs = 4326) # WGS84 coordinate system
# Check the object
class(DMI.Stations.Shp)## [1] "sf" "data.frame"
TASK: Fill in the function to create sf objects, its arguments (data, coords, crs), and coordinate column names.
# Load world map
wrld_simpl <- ne_countries(type = "countries", scale = "large")
# Fix invalid geometries (important!)
wrld_simpl <- st_make_valid(wrld_simpl)
# Plot all stations with world map
ggplot() +
geom_sf(data = wrld_simpl, fill = "gray80", color = "white") +
geom_sf(data = DMI.Stations.Shp, color = "blue", size = 2) +
theme_minimal() +
labs(title = "DMI Weather Stations",
x = "Longitude", y = "Latitude")TASK: Fill in function to get country data and its arguments, function to fix geometries, plus geoms for sf polygons and points.
# Filter for Denmark only (ISO code: DNK)
DK.Sites <- which(DMI.Stations.Shp$Country == "DNK")
# Subset to Denmark stations
DMI.DK.Shp <- DMI.Stations.Shp[DK.Sites, ]
# View summary
summary(DMI.DK.Shp)## X StationId Name Country
## Min. : 1.0 Min. : 5005 Length:185 Length:185
## 1st Qu.: 47.0 1st Qu.: 5406 Class :character Class :character
## Median :116.0 Median : 6126 Mode :character Mode :character
## Mean :111.3 Mean :13035
## 3rd Qu.:165.0 3rd Qu.:22189
## Max. :211.0 Max. :32110
##
## Owner StationType Status
## Length:185 Length:185 Length:185
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## StationHeightabvMSL.m. BarometerHeight..m. Region Operation.From
## Min. : 1.50 Min. : 2.700 Min. :6 Length:185
## 1st Qu.: 7.00 1st Qu.: 6.138 1st Qu.:6 Class :character
## Median :18.00 Median :16.500 Median :6 Mode :character
## Mean :26.15 Mean :24.757 Mean :6
## 3rd Qu.:38.50 3rd Qu.:41.820 3rd Qu.:6
## Max. :96.00 Max. :89.200 Max. :6
## NA's :70 NA's :147 NA's :70
## Country.Code StationId.1 MAT.C AnnPrec.mm.yr
## Min. :6080 Min. :6019 Min. :7.900 Min. :523.0
## 1st Qu.:6080 1st Qu.:6074 1st Qu.:8.400 1st Qu.:602.8
## Median :6080 Median :6124 Median :8.500 Median :671.0
## Mean :6080 Mean :6118 Mean :8.494 Mean :691.7
## 3rd Qu.:6080 3rd Qu.:6158 3rd Qu.:8.700 3rd Qu.:776.8
## Max. :6080 Max. :6197 Max. :9.100 Max. :921.0
## NA's :138 NA's :138 NA's :1 NA's :1
## geometry
## POINT :185
## epsg:4326 : 0
## +proj=long...: 0
##
##
##
##
TASK: Fill in the country column name, ISO code for Denmark, the subsetting index, and summary function.
# Plot Denmark stations with zoom
ggplot() +
geom_sf(data = wrld_simpl, fill = "gray80", color = "white") +
geom_sf(data = DMI.DK.Shp, color = "blue", size = 2) +
coord_sf(xlim = c(7.9, 15.1), ylim = c(54.5, 58.0), expand = FALSE) +
theme_minimal() +
labs(title = "DMI Stations in Denmark")TASK: Fill in the function to set coordinate limits in ggplot.
# Order stations by temperature
MAT.OrderVct <- DMI.DK.Shp[order(DMI.DK.Shp$MAT.C), ]
# Create scatter plot
ggplot(MAT.OrderVct, aes(x = X, y = MAT.C)) +
geom_point() +
labs(x = "Stations",
y = "Mean Annual Temperature (°C)",
title = "Mean Annual Temperature Distribution") +
theme_minimal()TASK: Fill in the function to order data, the temperature column, and geom for points.
# Fix invalid geometries in world map
wrld_simpl <- st_make_valid(wrld_simpl)
# Create Denmark bounding box
denmark_bbox <- st_bbox( c(xmin = 7.9, xmax = 15.1,
ymin = 54.5, ymax = 58.0),
crs = st_crs(4326))
# Crop world map to Denmark
wrld_simpl_denmark <- st_crop(wrld_simpl, denmark_bbox)
# Plot temperature spatially
ggplot() +
geom_sf(data = wrld_simpl_denmark, fill = "lightgray", color = NA) +
geom_sf(data = DMI.DK.Shp, aes(color = MAT.C), size = 2) +
scale_color_gradientn(colors = hcl.colors(100, palette = "Blue-Red"),
name = "Temperature (°C)") +
theme_minimal() +
labs(title = "Mean Annual Temperature in Denmark")TASK: Fill in functions to fix invalid geometries, create bounding box, crop spatial data, and the aesthetic for color.
HINT: Use st_make_valid() to fix
geometry issues before cropping.
For accurate distance calculations, transform to a projected coordinate system (ETRS89 Lambert Azimuthal Equal-Area).
# Transform stations to projected CRS
DMI.DK.Shp_proj <- st_transform(x = DMI.DK.Shp,
crs = 3035) # EPSG code for ETRS89 LAEA
# Transform Denmark map
wrld_simpl_denmark_proj <- st_transform(wrld_simpl_denmark, 3035)
# Check the new CRS
st_crs(DMI.DK.Shp_proj)## Coordinate Reference System:
## User input: EPSG:3035
## wkt:
## PROJCRS["ETRS89-extended / LAEA Europe",
## BASEGEOGCRS["ETRS89",
## DATUM["European Terrestrial Reference System 1989",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4258]],
## CONVERSION["Europe Equal Area 2001",
## METHOD["Lambert Azimuthal Equal Area",
## ID["EPSG",9820]],
## PARAMETER["Latitude of natural origin",52,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",10,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["False easting",4321000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",3210000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["northing (Y)",north,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["easting (X)",east,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## USAGE[
## SCOPE["unknown"],
## AREA["Europe - LCC & LAEA"],
## BBOX[24.6,-35.58,84.17,44.83]],
## ID["EPSG",3035]]
TASK: Fill in the transformation function and its arguments (object, crs), and function to check CRS.
# Plot with projected coordinates
ggplot() +
geom_sf(data = wrld_simpl_denmark_proj, fill = "lightgrey", color = NA) +
geom_sf(data = DMI.DK.Shp_proj, aes(color = MAT.C), size = 2) +
scale_color_gradientn(colors = hcl.colors(100, palette = "Blue-Red"),
name = "Temperature (°C)") +
theme_minimal() +
labs(title = "Mean Annual Temperature (Projected)")# Function to calculate Root Mean Square Error
RMSE.Fnc <- function(observed, predicted) {
sqrt(mean((predicted - observed)^2, na.rm = TRUE))
}TASK: Fill in the functions: outer function (square root), inner function (mean), and subtraction components.
The null model assigns the mean temperature to all locations.
# Assign mean as prediction (null model)
DMI.DK.Shp_proj$Null.MAT <- mean(DMI.DK.Shp_proj$MAT.C, na.rm = TRUE)
# Calculate RMSE for null model
null.RMSE.MAT <- RMSE.Fnc(observed = DMI.DK.Shp_proj$MAT.C,
predicted = DMI.DK.Shp_proj$Null.MAT)
# Print null model RMSE
print(paste("Null Model RMSE:", round(null.RMSE.MAT, 3)))## [1] "Null Model RMSE: 0.268"
TASK: Fill in the mean function, RMSE function, and its arguments (observed, predicted).
Inverse Distance Weighting assumes nearby points are more similar than distant points. It weights observations by the inverse of their distance to the prediction location.
Key parameters: - nmax: Number of
nearest neighbors to use - idp: Inverse distance power (0 =
simple average, 2 = default weighting)
# Create raster template
DK.SpaGrid.1km <- rast(wrld_simpl_denmark_proj, res = 1000) # 1km resolution
# Initialize with values
values(DK.SpaGrid.1km) <- 1
# Mask ocean areas
DK.SpaGrid.1km <- mask(x = DK.SpaGrid.1km,
mask = wrld_simpl_denmark_proj)
# Convert to SpatialGridDataFrame for gstat
DK.SpaGrid.1km <- as(raster::raster(DK.SpaGrid.1km),
'SpatialGridDataFrame')TASK: Fill in functions to create raster, set values, mask, and the class name for spatial grid.
# Remove NA values
DMI.DK.Shp_proj_noNA <- DMI.DK.Shp_proj[!is.na(DMI.DK.Shp_proj$MAT.C), ]
# Perform IDW
library(gstat)
idw.gstat <- idw(formula = MAT.C ~ 1,
locations = as(DMI.DK.Shp_proj_noNA, "Spatial"),
newdata = DK.SpaGrid.1km,
nmax = 10, # Use 10 nearest neighbors
idp = 0) # Equal weighting (no distance decay)## [inverse distance weighted interpolation]
## [1] "SpatialGridDataFrame"
## attr(,"package")
## [1] "sp"
# Plot result
library(tidyterra)
ggplot() +
geom_spatraster(data = IDW.MAT.1km[[1]]) +
scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
geom_sf(data = st_as_sf(wrld_simpl_denmark_proj),
fill = NA, color = "black") +
theme_minimal() +
labs(title = "IDW Interpolation - Mean Annual Temperature")TASK: Fill in the temperature column, IDW function, its arguments (formula, locations, newdata, nmax, idp), rast conversion function, and geom for rasters.
# Set seed for reproducibility
set.seed(5132015)
# Create 5-fold cross-validation groups
library(dismo)
kf <- kfold(x = DMI.DK.Shp_proj, k = 5)
# Vector to store RMSE values
RMSE.IDW.MAT <- rep(NA, 5)
# Loop through folds
for (k in 1:5) {
# Create test and train sets
test <- DMI.DK.Shp_proj[kf == k, ]
train <- DMI.DK.Shp_proj[kf != k, ]
# Build gstat model
gs <- gstat(formula = MAT.C ~ 1,
locations = train[!is.na(train$MAT.C), ],
nmax = 10,
set = list(idp = 0))
# Predict for test locations
p <- predict(gs, test)
# Calculate RMSE
RMSE.IDW.MAT[k] <- RMSE.Fnc(test$MAT.C, p$var1.pred)
}## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [inverse distance weighted interpolation]
## [1] 0.18781100 0.14695099 0.11099299 0.10186106 0.09179884
## [1] "Mean RMSE: 0.128"
# Calculate improvement over null model
RMSE.IDW.MAT.Imp <- 1 - (mean(RMSE.IDW.MAT) / null.RMSE.MAT)
print(paste("Improvement over null:", round(RMSE.IDW.MAT.Imp * 100, 1), "%"))## [1] "Improvement over null: 52.3 %"
TASK: Fill in the kfold function and its arguments, fold indices, gstat function and arguments, predict function, and prediction column name.
Thin Plate Splines fit a smooth surface through data points, like bending a thin metal sheet. They minimize curvature while fitting the data, creating smooth interpolations.
# Load fields package
library(fields)
# Fit TPS model
TPS.MAT <- Tps(x = st_coordinates(DMI.DK.Shp_proj), # Coordinates matrix
Y = DMI.DK.Shp_proj$MAT.C) # Response variable## Warning:
## Grid searches over lambda (nugget and sill variances) with minima at the endpoints:
## (GCV) Generalized Cross-Validation
## minimum at right endpoint lambda = 9.608608e-11 (eff. df= 158.65 )
# Generate predictions
TPS.MAT.Pred <- interpolate(object = raster(DK.SpaGrid.1km),
model = TPS.MAT)
# Mask ocean
TPS.MAT.Pred <- mask(TPS.MAT.Pred, wrld_simpl_denmark_proj)
# Convert to SpatRaster
TPS.MAT.1km <- rast(TPS.MAT.Pred)
# Plot result
library(tidyterra)
ggplot() +
geom_spatraster(data = TPS.MAT.1km) +
scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
geom_sf(data = st_as_sf(wrld_simpl_denmark_proj),
fill = NA, color = "black") +
theme_minimal() +
labs(title = "TPS Interpolation - Mean Annual Temperature")TASK: Fill in the TPS function, coordinate extraction function, its arguments (x, Y), interpolate function, and its arguments.
# Set seed
set.seed(5132015)
# Create folds
kf <- kfold(x = DMI.DK.Shp_proj, k = 5)
# Vector for RMSE
RMSE.TPS.MAT <- rep(NA, 5)
# Loop through folds
for (k in 1:5) {
test <- DMI.DK.Shp_proj[kf == k, ]
train <- DMI.DK.Shp_proj[kf != k, ]
# Fit TPS model
Tps.mod <- Tps(x = st_coordinates(train),
Y = train$MAT.C)
# Predict
p <- predict(Tps.mod, st_coordinates(test))
# Calculate RMSE
RMSE.TPS.MAT[k] <- RMSE.Fnc(test$MAT.C, p)
}## Warning:
## Grid searches over lambda (nugget and sill variances) with minima at the endpoints:
## (GCV) Generalized Cross-Validation
## minimum at right endpoint lambda = 1.310831e-10 (eff. df= 130.15 )
## Warning:
## Grid searches over lambda (nugget and sill variances) with minima at the endpoints:
## (GCV) Generalized Cross-Validation
## minimum at right endpoint lambda = 1.940061e-10 (eff. df= 129.2 )
## Warning:
## Grid searches over lambda (nugget and sill variances) with minima at the endpoints:
## (GCV) Generalized Cross-Validation
## minimum at right endpoint lambda = 2.210463e-10 (eff. df= 130.1501 )
## Warning:
## Grid searches over lambda (nugget and sill variances) with minima at the endpoints:
## (GCV) Generalized Cross-Validation
## minimum at right endpoint lambda = 1.284304e-10 (eff. df= 127.3 )
## Warning:
## Grid searches over lambda (nugget and sill variances) with minima at the endpoints:
## (GCV) Generalized Cross-Validation
## minimum at right endpoint lambda = 1.52432e-10 (eff. df= 131.1 )
## [1] 0.09009537 0.10918297 0.15433318 0.13928009 0.06007873
## [1] "Mean RMSE: 0.111"
# Calculate improvement
RMSE.TPS.MAT.Imp <- 1 - (mean(RMSE.TPS.MAT) / null.RMSE.MAT)
print(paste("Improvement over null:", round(RMSE.TPS.MAT.Imp * 100, 1), "%"))## [1] "Improvement over null: 58.7 %"
TASK: Fill in Tps function, its arguments, predict function, and coordinate extraction for test data.
Kriging uses spatial autocorrelation structure (semivariogram) to optimally weight nearby observations. It provides: - Best linear unbiased predictions - Prediction uncertainty estimates
# Convert to Spatial object
DMI.DK.Shp_proj_SP <- as(DMI.DK.Shp_proj, "Spatial")
# Check for zero-distance points
zerodist(DMI.DK.Shp_proj_SP)## [,1] [,2]
## [1,] 50 119
## [2,] 53 123
## [3,] 56 126
## [4,] 63 132
## [5,] 67 134
## [6,] 71 137
## [7,] 75 140
## [8,] 81 148
## [9,] 90 157
## [10,] 94 161
## [11,] 95 162
## [12,] 100 168
## [13,] 101 169
## [14,] 102 170
## [15,] 104 174
## [16,] 105 177
## [17,] 42 182
# Remove duplicates if any
DMI.DK.Shp_proj2 <- DMI.DK.Shp_proj_SP[-zerodist(DMI.DK.Shp_proj_SP)[, 1], ]
# Check again
zerodist(DMI.DK.Shp_proj2)## [,1] [,2]
TASK: Fill in the Spatial class name and the function to check zero distances.
# Create gstat object
MAT.gstat <- gstat(formula = MAT.C ~ 1,
locations = DMI.DK.Shp_proj_SP[!is.na(DMI.DK.Shp_proj_SP$MAT.C), ])
# Calculate sample variogram
MAT.Variog <- variogram(MAT.gstat, cutoff = 200000) # 200 km cutoff
# View variogram
head(MAT.Variog)## np dist gamma dir.hor dir.ver id
## 1 120 4580.958 0.00437500 0 0 var1
## 2 311 21739.396 0.02249196 0 0 var1
## 3 519 33041.200 0.02368015 0 0 var1
## 4 669 46696.089 0.03146487 0 0 var1
## 5 811 59989.650 0.04115290 0 0 var1
## 6 894 73625.525 0.05159396 0 0 var1
TASK: Fill in gstat function and arguments, variogram function, head function, and plot function.
# Automatic variogram fitting
library(automap)
MAT.Exp.Variog <- autofitVariogram(formula = MAT.C ~ 1,
input_data = DMI.DK.Shp_proj2[!is.na(DMI.DK.Shp_proj2$MAT.C), ],
model = c("Sph", "Exp", "Gau", "Nug", "Lin"))
# View fitted model
MAT.Exp.Variog## $exp_var
## np dist gamma dir.hor dir.ver id
## 1 53 739.3034 0.0006603774 0 0 var1
## 2 8 5760.3361 0.0062500000 0 0 var1
## 3 19 10282.7241 0.0144736842 0 0 var1
## 4 63 14933.1670 0.0236507937 0 0 var1
## 5 142 21371.2527 0.0228521127 0 0 var1
## 6 179 26821.9844 0.0210335196 0 0 var1
## 7 760 40485.4046 0.0286578947 0 0 var1
## 8 958 60047.1998 0.0430219207 0 0 var1
## 9 1832 85197.6066 0.0665447598 0 0 var1
## 10 1846 114625.0535 0.0848456121 0 0 var1
## 11 1718 144017.3889 0.0842287544 0 0 var1
## 12 2330 178465.6679 0.0842296137 0 0 var1
##
## $var_model
## model psill range
## 1 Nug 0.0000708596 0.0
## 2 Sph 0.0867150862 154583.3
##
## $sserr
## [1] NA
##
## attr(,"class")
## [1] "autofitVariogram" "list"
# Plot sample variogram with fitted model
plot(MAT.Variog,
MAT.Exp.Variog[[2]],
main = "Semivariogram with Fitted Model")TASK: Fill in autofitVariogram function and its arguments (formula, input_data, model), and objects to plot.
# Recreate spatial grid
DK.SpaGrid.1km <- raster(wrld_simpl_denmark_proj)
res(DK.SpaGrid.1km) <- 1000
DK.SpaGrid.1km[] <- 1
DK.SpaGrid.1km <- mask(DK.SpaGrid.1km, wrld_simpl_denmark_proj)
DK.SpaGrid.1km <- as(DK.SpaGrid.1km, 'SpatialGridDataFrame')
# Create gstat object with variogram
gstat.MAT <- gstat(formula = MAT.C ~ 1,
locations = DMI.DK.Shp_proj2[!is.na(DMI.DK.Shp_proj2$MAT.C), ],
model = MAT.Exp.Variog[[2]])
# Predict
Pred.gstat.MAT <- predict(object = gstat.MAT,
newdata = DK.SpaGrid.1km)## [using ordinary kriging]
# Convert to raster
Krig.MAT <- rast(Pred.gstat.MAT)
# Plot
ggplot() +
geom_spatraster(data = Krig.MAT[[1]]) +
scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
geom_sf(data = st_as_sf(wrld_simpl_denmark_proj),
fill = NA, color = "black") +
theme_minimal() +
labs(title = "Ordinary Kriging - Mean Annual Temperature")TASK: Fill in gstat function and arguments (formula, locations, model), predict function and arguments, and rast conversion.
# Direct kriging
Pred.krige.MAT <- krige(formula = MAT.C ~ 1,
locations = DMI.DK.Shp_proj2[!is.na(DMI.DK.Shp_proj2$MAT.C), ],
newdata = DK.SpaGrid.1km,
model = MAT.Exp.Variog[[2]])## [using ordinary kriging]
# Convert and plot
Krig.MAT <- rast(Pred.krige.MAT)
ggplot() +
geom_spatraster(data = Krig.MAT[[1]]) +
scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
geom_sf(data = st_as_sf(wrld_simpl_denmark_proj),
fill = NA, color = "black") +
theme_minimal() +
labs(title = "Kriging - Mean Annual Temperature")TASK: Fill in krige function and its arguments (formula, locations, newdata, model).
# Automatic kriging with automap
Pred.autoKrige.MAT <- autoKrige(formula = MAT.C ~ 1,
input_data = DMI.DK.Shp_proj2[!is.na(DMI.DK.Shp_proj2$MAT.C), ],
new_data = DK.SpaGrid.1km)## [using ordinary kriging]
# Convert and plot
Krig.MAT <- rast(Pred.autoKrige.MAT[[1]])
ggplot() +
geom_spatraster(data = Krig.MAT[[1]]) +
scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
geom_sf(data = st_as_sf(wrld_simpl_denmark_proj),
fill = NA, color = "black") +
theme_minimal() +
labs(title = "Auto Kriging - Mean Annual Temperature")TASK: Fill in autoKrige function and its arguments (formula, input_data, new_data).
# Kriging cross-validation
Cros.Val.MAT <- krige.cv(formula = MAT.C ~ 1,
locations = DMI.DK.Shp_proj2[!is.na(DMI.DK.Shp_proj2$MAT.C), ],
nfold = 5,
model = MAT.Exp.Variog[[2]])
# Calculate RMSE for each fold
RMSE.Kig.MAT <- sapply(1:5, function(i) {
RMSE.Fnc(
observed = DMI.DK.Shp_proj2[!is.na(DMI.DK.Shp_proj2$MAT.C), ][Cros.Val.MAT$fold == i, ]$MAT.C,
predicted = Cros.Val.MAT$var1.pred[Cros.Val.MAT$fold == i]
)
})
# Print results
print(RMSE.Kig.MAT)## [1] 0.10652107 0.13674795 0.08725594 0.09190339 0.05868823
## [1] "Mean RMSE: 0.096"
# Calculate improvement
RMSE.Kig.MAT.Imp <- 1 - (mean(RMSE.Kig.MAT) / null.RMSE.MAT)
print(paste("Improvement over null:", round(RMSE.Kig.MAT.Imp * 100, 1), "%"))## [1] "Improvement over null: 64.1 %"
TASK: Fill in krige.cv function and its arguments (formula, locations, nfold, model), and the prediction column name.
# Create comparison table
RMSE.sum <- data.frame(
Model = c("Null", "IDW", "TPS", "Kriging"),
Mean_RMSE = c(null.RMSE.MAT,
mean(RMSE.IDW.MAT),
mean(RMSE.TPS.MAT),
mean(RMSE.Kig.MAT)),
Improvement = c(0,
RMSE.IDW.MAT.Imp * 100,
RMSE.TPS.MAT.Imp * 100,
RMSE.Kig.MAT.Imp * 100)
)
# Print table
print(RMSE.sum)## Model Mean_RMSE Improvement
## 1 Null 0.26807916 0.00000
## 2 IDW 0.12788298 52.29656
## 3 TPS 0.11059407 58.74574
## 4 Kriging 0.09622331 64.10638
# Visualize comparison
ggplot(RMSE.sum[-1, ], aes(x = Model, y = Improvement)) +
geom_bar(stat = "identity", fill = "steelblue") +
theme_minimal() +
labs(title = "Model Performance Comparison",
y = "Improvement over Null Model (%)",
x = "Interpolation Method")TASK: Fill in null RMSE, RMSE vectors for each method, improvement values, and geom for bars.
Question: Which interpolation method performed best? Why do you think this is the case?
Answer space: - Best method: Kriging- Reason: Kriging incorporates the semivariogram model to more accurately predict values than the other models, as it can capture how exactly the response variable varies with distance.
# Create weights based on RMSE (inverse weighting)
RMSE.w <- data.frame(
Model = c("IDW", "TPS", "Kriging"),
MAT.RMSE = c(mean(RMSE.IDW.MAT),
mean(RMSE.TPS.MAT),
mean(RMSE.Kig.MAT))
)
# Calculate weights (inverse proportion)
RMSE.w$w <- (1/RMSE.w$MAT.RMSE) / sum((1/RMSE.w$MAT.RMSE))
print(RMSE.w)## Model MAT.RMSE w
## 1 IDW 0.12788298 0.2869152
## 2 TPS 0.11059407 0.3317680
## 3 Kriging 0.09622331 0.3813168
TASK: Fill in mean function for each RMSE vector and sum function.
# Stack all predictions
MAT.Stack <- c(IDW.MAT.1km[[1]],
rast(TPS.MAT.Pred),
project(Krig.MAT, IDW.MAT.1km)[[1]])
# Name layers
names(MAT.Stack) <- c("IDW", "TPS", "Kriging")
print(MAT.Stack)## class : SpatRaster
## size : 397, 446, 3 (nrow, ncol, nlyr)
## resolution : 1000.927, 1000.53 (x, y)
## extent : 4200694, 4647107, 3491165, 3888375 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs
## source(s) : memory
## names : IDW, TPS, Kriging
## min values : 8.01, 7.897239, 7.904867
## max values : 8.94, 9.325737, 9.089675
TASK: Fill in function to convert to raster and function to set names.
# Weighted ensemble
MAT.Ensembl <- sum(MAT.Stack * RMSE.w$w)
# Plot ensemble
ggplot() +
geom_spatraster(data = MAT.Ensembl) +
scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
geom_sf(data = st_as_sf(wrld_simpl_denmark_proj),
fill = NA, color = "black") +
theme_minimal() +
labs(title = "Ensemble Prediction - Mean Annual Temperature")TASK: Fill in function to sum rasters and the stack object.
# Add ensemble to stack
MAT.Stack <- c(MAT.Stack, MAT.Ensembl)
names(MAT.Stack)[4] <- "Ensemble"
# Plot all methods
# Plot all methods
ggplot() +
geom_spatraster(data = MAT.Stack) +
facet_wrap(~lyr) +
scale_fill_viridis_c(name = "Temperature (°C)", option = "B") +
geom_sf(data = st_as_sf(wrld_simpl_denmark_proj),
fill = NA, color = "black") +
theme_minimal() +
labs(title = "All Predictions - Mean Annual Temperature")✓ Point data preparation - Loading and visualizing
station data
✓ Coordinate transformation - Converting to projected
CRS
✓ Null model - Establishing baseline performance
✓ IDW - Distance-based weighting interpolation
✓ TPS - Smooth surface fitting
✓ Kriging - Geostatistical interpolation using
semivariograms
✓ Cross-validation - Assessing model performance
✓ Model comparison - Selecting optimal method
✓ Ensemble modeling - Combining multiple methods
| Method | Pros | Cons | Best For |
|---|---|---|---|
| IDW | Simple, fast | Doesn’t smooth, edge effects | Quick estimates |
| TPS | Smooth surfaces | Can oversmooth | Gradual transitions |
| Kriging | Optimal weights, uncertainty | Complex, needs variogram | High accuracy needed |
| Ensemble | Robust, balanced | Complex | Critical applications |
Use IDW when: - Need quick results - Data is relatively uniform - Computational resources limited
Use TPS when: - Want smooth surfaces - Gradual spatial patterns - No strong directional trends
Use Kriging when: - Need highest accuracy - Have sufficient data points - Want uncertainty estimates - Spatial autocorrelation is important
Use Ensemble when: - Maximum reliability needed - Multiple methods perform similarly - Want to hedge against method-specific biases
Repeat the analysis using Annual Precipitation instead of Mean Annual Temperature. Which method works best? Is it different from temperature?
Try different grid resolutions (500m, 2000m, 5000m). How does resolution affect: - Computational time? - Visual appearance? - Cross-validation RMSE?
In the Kriging section, try fitting different variogram models
manually: - Spherical - Exponential
- Gaussian
Which provides the best fit?
Bivand, R.S., Pebesma, E., & Gomez-Rubio, V. (2013). Applied Spatial Data Analysis with R. Springer.
Hengl, T. (2009). A Practical Guide to Geostatistical Mapping. University of Amsterdam.
Webster, R., & Oliver, M.A. (2007). Geostatistics for Environmental Scientists. Wiley.